UMI-tools: Modelling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

Running title: Modelling UMI errors improves quantification accuracy

Authors

  • Tom Smith
  • Andreas Heger
  • Ian Sudbery
Abstract

Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalised. In particular, sequencing errors in the UMI sequence are often ignored, or else resolved in an ad-hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and in real iCLIP and single cell RNA-Seq datasets. Reproducibility between iCLIP replicates and single cell RNA-Seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package (https://github.com/CGATOxford/UMI-tools).

bioRxiv preprint, first posted online May 9, 2016; doi: http://dx.doi.org/10.1101/051755. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
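To make the network-based idea in the abstract concrete, the following is a minimal sketch, not the UMI-tools implementation. It assumes that UMIs observed at one mapping position are linked when they differ by at most one base, and that each connected component of the resulting network is counted as one original molecule. The function names `hamming` and `count_molecules`, the one-mismatch threshold, and the example UMIs are all hypothetical; the abstract does not specify these details, and the package's actual methods may differ.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def count_molecules(umis: Iterable[str], max_dist: int = 1) -> int:
    """Count connected components in a network that links UMIs
    differing by at most max_dist bases (assumed error threshold)."""
    unique = sorted(set(umis))

    # Build the network: an edge joins every pair of similar UMIs.
    neighbours = defaultdict(set)
    for a, b in combinations(unique, 2):
        if hamming(a, b) <= max_dist:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first search; each unvisited start node opens a new component.
    seen = set()
    components = 0
    for umi in unique:
        if umi in seen:
            continue
        components += 1
        stack = [umi]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return components


# Hypothetical UMIs seen at a single mapping position.
example = ["ATTG", "ATTC", "ATTA", "CGGA"]
naive = len(set(example))             # every distinct UMI counted: 4
networked = count_molecules(example)  # similar UMIs merged: 2
```

In this toy case, counting every distinct UMI reports four molecules, whereas the network view treats `ATTG`, `ATTC` and `ATTA` as one molecule with sequencing errors and reports two.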


Similar resources

Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Mi...


The impact of amplification on differential expression analyses by RNA-seq

Currently, quantitative RNA-seq methods are pushed to work with increasingly small starting amounts of RNA that require amplification. However, it is unclear how much noise or bias amplification introduces and how this affects precision and accuracy of RNA quantification. To assess the effects of amplification, reads that originated from the same RNA molecule (PCR-duplicates) need to be identif...


A cost effective 5΄ selective single cell transcriptome profiling approach with improved UMI design

Single cell RNA sequencing approaches are instrumental in studies of cell-to-cell variability. 5΄ selective transcriptome profiling approaches allow simultaneous definition of the transcription start site and have advantages over 3΄ selective approaches which just provide internal sequences close to the 3΄ end. The only currently existing 5΄ selective approach requires costly and labor intensiv...


A survey of the accuracy of cited articles in medical specialty doctoral theses at Tehran University of Medical Sciences

Background and Aim: Citation can be considered the basis of scientific research. Researchers use citations either to show that their findings agree with established knowledge or to point readers to further references. Maintaining the chain of information through citation is essential, and theses are no exception. This study was done to review the ...


The impact of amplification on differential expression analyses by RNA-seq

Correspondence: [email protected]. Anthropology and Human Genomics, Department of Biology II, Ludwig Maximilians University Munich, Grosshaderner Str. 2, D-82152 Martinsried, Germany. Full list of author information is available at the end of the article. Abstract. Background: Currently quantitative RNA-Seq methods are pushed to work with increasingly small starting amounts of RNA that require PCR ...



Journal title:

Volume   Issue 

Pages  -

Publication date: 2016